Lexical Selection for Hybrid MT with Sequence Labeling
Abstract
We present initial work on an inexpensive approach to building large-vocabulary lexical selection modules for hybrid RBMT systems by framing lexical selection as a sequence labeling problem. We argue that Maximum Entropy Markov Models (MEMMs) are a sensible formalism for this problem because they can take into account many features of the source text, and we show how to build a combined MEMM/HMM system that gives MT system implementors flexibility over which words have their lexical choices modeled with classifiers. We present initial results showing successful use of this system in translating both English to Spanish and Spanish to Guarani.
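To make the framing concrete, here is a minimal sketch of lexical selection as sequence labeling with Viterbi decoding. It is not the authors' implementation: the function `select_lexical_choices`, the toy probabilities, and the hand-written `bank_classifier` (standing in for a trained maximum entropy classifier) are all assumptions made for illustration. Words with a classifier are scored MEMM-style from the previous label and the source context; all other words fall back to HMM-style transition and emission scores.

```python
import math

START = "<s>"  # hypothetical start-of-sentence label


def select_lexical_choices(source, candidates, transition, emission, classifiers=None):
    """Viterbi search over target-word choices, one label per source word.

    classifiers maps a source word to a function returning
    {label: P(label | previous label, source context)} (MEMM-style step).
    Words without a classifier use transition * emission (HMM-style step).
    """
    classifiers = classifiers or {}
    # best[label] = (log-score of the best path ending in label, that path)
    best = {START: (0.0, [])}
    for i, word in enumerate(source):
        new_best = {}
        for prev, (prev_score, path) in best.items():
            if word in classifiers:
                local = classifiers[word](prev, source, i)
            else:
                local = {lab: transition(prev, lab) * emission(lab, word)
                         for lab in candidates[word]}
            for lab in candidates[word]:
                p = local.get(lab, 0.0)
                if p <= 0.0:
                    continue  # skip impossible choices
                score = prev_score + math.log(p)
                if lab not in new_best or score > new_best[lab][0]:
                    new_best[lab] = (score, path + [lab])
        best = new_best
    return max(best.values(), key=lambda sp: sp[0])[1]


# Toy English-to-Spanish models; every number below is invented.
TRANS = {("río", "orilla"): 0.7, ("río", "banco"): 0.3}


def transition(prev, lab):
    return TRANS.get((prev, lab), 0.5)


def emission(lab, word):
    return 1.0


def bank_classifier(prev, source, i):
    # Stand-in for a trained classifier: one source-side context feature.
    if "money" in source:
        return {"banco": 0.9, "orilla": 0.1}
    return {"banco": 0.5, "orilla": 0.5}


candidates = {"river": ["río"], "money": ["dinero"], "bank": ["banco", "orilla"]}

hmm_only = select_lexical_choices(["river", "bank"], candidates, transition, emission)
with_clf = select_lexical_choices(["money", "bank"], candidates, transition, emission,
                                  classifiers={"bank": bank_classifier})
print(hmm_only, with_clf)
```

In this toy setup only "bank" has a classifier, which mirrors the flexibility described above: an implementor could attach classifiers to the ambiguous words that matter most and leave the rest to the cheaper HMM scores.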
Similar resources
A Hybrid Machine Translation System Based on a Monotone Decoder
In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...
Deep Grammars in a Tree Labeling Approach to Syntax-based Statistical Machine Translation
In this paper, we propose a new syntax-based machine translation (MT) approach based on reducing the MT task to a tree-labeling task, which is further decomposed into a sequence of simple decisions for which discriminative classifiers can be trained. The approach is very flexible and we believe that it is particularly well-suited for exploiting the linguistic knowledge encoded in deep grammars wh...
Towards the Automatic Acquisition of Lexical Selection Rules
This paper is a study of a certain type of collocations and implication and application to acquisition of lexical selection rules in transfer-approach MT systems. Collocations reveal the co-occurrence possibilities of linguistic units in one language, which often require lexical selection rules to enhance the natural flow and clarity of MT output. The study presents an automatic acquisition and...
Verb Semantics and Lexical Selection
This paper will focus on the semantic representation of verbs in computer systems and its impact on lexical selection problems in machine translation (MT). Two groups of English and Chinese verbs are examined to show that lexical selection must be based on interpretation of the sentence as well as selection restrictions placed on the verb arguments. A novel representation scheme is suggested, a...
THE JOHNS HOPKINS UNIVERSITY Sub-Lexical and Contextual Modeling of Out-of-Vocabulary Words in Speech Recognition
Large vocabulary speech recognition systems fail to recognize words beyond their vocabulary, many of which are information rich terms, like named entities or foreign words. Hybrid word/sub-word systems solve this problem by adding sub-word units to large vocabulary word based systems; new words can then be represented by combinations of subword units. We present a novel probabilistic model to l...